1 Introduction

Music has an incredibly fascinating effect on people. Humans are one of the few, if not the only, animals that can naturally feel rhythms. When music is played, it only takes moments for people to clap along to the beat of the song. Specific genres and styles of music have defined generations and cultures time and time again. It’s only natural for music to be so powerful for people.

This notion has attracted considerable attention in the media. A few years ago, ABC Science released a video (watch it here) discussing the power of music on the brain, specifically for people with dementia and Parkinson’s. In the video, researchers showed how playing music reminiscent of one’s past can trigger memories, even for someone who has forgotten them. They did so by curating playlists tailored to the pasts of certain residents at a nursing home. When the dementia patients listened to these playlists, many of them had a sudden shift in mood. Their family members were surprised to see them “come back to life,” in a sense. These residents would talk about their past and be more open and happy than they had been in months.

Another segment of the video showed how music can help with diseases that affect the motor system, like Parkinson’s. At a human movement lab, a professor has spent years analyzing these movement disorders and trying to treat them. One thing she tried was playing music, and the change that comes over her patients is striking. John, one of her patients, suffers from Parkinson’s. The video shows how the debilitating disease limits his natural motor skills and prevents voluntary movement. However, once the professor plays music for him, John is able not only to walk, but to dance with a partner.

The last part of the video follows a man named Shane who suffered severe brain damage from a bike accident. Following the accident, he could barely speak, move, or recall anything. Later, he became part of an experiment to see if music could evoke memories for people who had severe memory loss from brain injuries. The results showed that he did as well as people with normal, healthy brains, despite the accident. For example, Shane had trouble recalling a memory from grade school if he was simply asked to, but had no trouble doing so while a song from grade school was playing.

This video illustrates not only the power of music, but its prevalence in our natural being. It is much more than a form of entertainment; it is part of a complete life. The rhythms and tunes of songs, whether from traditional instruments, natural sounds, or digital playlists, bury themselves in our brains and become an integral part of human life.

Additionally, music is not immune to societal trends that shift over time. Since it is a form of art, it generally reflects the major feelings and emotions circulating in a given period. The prevalence of music in our lives, combined with its artistic essence, makes it a strong vantage point for looking at how society changes the way it expresses itself. For example, songs from the late 1940s and 1950s tend to be filled with sad or melancholic lyrics, making for more “low-mood” songs. This has much to do with the major conflicts of the time, namely World War II and the subsequent Cold War.

Trends in music, and in art in general, can illuminate these societal shifts. Seeing how artists express their views of the world provides a look into life during a given time period. Furthermore, music has the advantage that the popularity of a song or genre can be tracked. Music therefore gives a perspective not only on what songs and emotions are being produced in a given time, but also on the extent to which each is consumed. So, it is possible to look at both the emotions expressed in music and how much they resonate with listeners.

Change in music is not limited to different time periods, but is also present in different phases of one’s life. For example, there is a popular practice for pregnant mothers to play classical music for their babies in hopes of making them more intelligent. In other phases of life, teens may be warned to stay away from certain genres of music because they will “rot their brains” or act as a bad influence. In either case, music enters our lives early. Many young musical prodigies are discovered because they begin reacting to music even before they are able to speak or walk.

Our plan for this study relies on a comprehensive data set supplied by the Spotify API. This data encompasses many interesting variables, and there is much to experiment with. It provides a fertile ground for exploration into how music affects humans. Two variables specifically are eye-catching: valence, a measure of the happiness of a song, and explicitness, whether a song contains explicit lyrics or not. There are also variables regarding year published, popularity, acousticness, etc. There is a perceived trend that current music (2010-2020) is becoming more and more explicit and despairing. In fact, this can be seen in Figure 1.1 with a dip in valence after 1975. It may be possible to look at these different characteristics in acousticness, valence, and explicitness to create a model that predicts the popularity of music.

Figure 1.1: Scatterplot of Spotify’s valence score by year

In addition to examining trends in music consumption, this study aims to take a closer look at correlations between the music dataset and other datasets on mental health and crime. Is it possible that the prevalence of explicit music is rising with crime rates? Is it possible that the decrease in valence score is correlated with an increase in mental health cases? These questions will be discussed in the subsequent sections.

2 Data

The datasets used in this project required a lot of work to clean and prepare for visualization and analysis. The data cleaning code is outlined in the ATABAS_HUANG_dataCleaning.Rmd file, with a short description attached to each cleaning procedure.

Music:

The music dataset is made available through the Spotify API. Spotify is an audio streaming service that launched in 2008 and is now one of the most popular streaming platforms for music and podcasts. Because of this status as one of the most popular sites for music consumption, the dataset is fairly comprehensive.

The data encompasses many interesting variables. Of course there are descriptors such as artist or band name, year of song, genre, and key, but there are many other metrics that require more attention.

These specially constructed metrics usually range from 0 to 1. For example, Spotify describes the metric for the valence (happiness) of a song as follows: “A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry)”.

Explicitness is described as follows: “Whether or not the episode has explicit content (true = yes it does; false = no it does not OR unknown)”. Danceability, in turn, is described as: “Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable”.

Crime:

The United Nations Office on Drugs and Crime (UNODC) maintains many comprehensive datasets on crime in its various forms. The organization, as an office of the United Nations, addresses pressing concerns related to crime in general, with a focus on tackling and preventing it. The datasets on its website are grouped by type of crime, with subsets for more detail. For example, distinctions are made between sexual violence, homicide, and assault. Further, assault can be analyzed more specifically by mechanism (firearm, sharp object, serious, etc.).

With a dataset this comprehensive, it is possible to make different comparisons by extracting the parts of the data that are of interest. In this case, the interest was in the relationship between trends in crime and trends in explicit music. Many studies have shown certain types of music (classical music, joyous music) to have a positive impact. However, there are also claims that media in the form of video games, TV, or music can have an equally negative impact depending on its content. Along these lines, it may be interesting to explore any relationships between explicit music and crime.

The limitations of this dataset originate mainly from the collection process. The data are collected through the annual United Nations crime trends survey, which is conducted by the UNODC and takes responses from government officials in each country. This method relies not only on countries reporting to the U.N., but also on local authorities within countries like the U.S. reporting to the federal level. Reported crime may also differ between countries, because each country can have different laws regarding different crimes. Even states or counties within a given country can have different guidelines for recording a crime. This should be kept in mind when drawing conclusions.

Since all of these datasets were provided in separate files, it was necessary to read the Excel files separately and then merge them by year.
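As a rough sketch of this merge step, assume each file has already been read into a data frame with a shared `year` column (for instance with `readxl::read_excel()`). The data frames, column names, and values below are toy placeholders, not the UNODC figures or the code from the cleaning file.

```r
# Toy stand-ins for the separately read crime files (all values are made up).
assault <- data.frame(year = 2010:2012, assault_rate = c(10.1, 9.8, 9.5))
robbery <- data.frame(year = 2010:2012, robbery_rate = c(5.2, 5.0, 4.9))
sexual_violence <- data.frame(year = 2011:2012,
                              sexual_violence_rate = c(2.1, 2.3))

# Merge all data frames on `year`; all = TRUE keeps years missing from a file
# and fills the gaps with NA.
crime <- Reduce(function(x, y) merge(x, y, by = "year", all = TRUE),
                list(assault, robbery, sexual_violence))
crime
```

Using `all = TRUE` is one possible choice here; dropping it would keep only the years present in every file.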

The data also come in a format that is not easy to work with, with separate rate and count columns for each year (i.e., rate of crime and count of crime). The available years are 2010 to 2017.
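One way to make such a layout easier to work with is to reshape it so that each row holds one country and one year. The sketch below uses base R on a toy table; the column names (`rate_2010`, `count_2010`, and so on) are assumptions for illustration and may not match the actual UNODC files.

```r
# Toy wide table; column names and values are assumed, not taken from UNODC.
wide <- data.frame(
  country    = c("A", "B"),
  rate_2010  = c(10.5, 20.1),
  count_2010 = c(100, 400),
  rate_2011  = c(11.0, 19.4),
  count_2011 = c(105, 390)
)

years <- c(2010, 2011)

# Build one block of rows per year, then stack the blocks into a long table.
long <- do.call(rbind, lapply(years, function(y) {
  data.frame(
    country = wide$country,
    year    = y,
    rate    = wide[[paste0("rate_", y)]],
    count   = wide[[paste0("count_", y)]]
  )
}))
long
```

In this long form, a plot or a merge by year only needs the `year`, `rate`, and `count` columns rather than one pair of columns per year.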

The data from UNODC will be used to study the relationship between trends in crime and trends in explicit music.

World Happiness:

The first World Happiness Report was released on April 1, 2012 as a foundational text for the UN High Level Meeting: Well-being and Happiness: Defining a New Economic Paradigm, drawing international attention. The report outlined the state of world happiness, causes of happiness and misery, and policy implications highlighted by case studies. In 2013, the second World Happiness Report was issued, and since then has been issued on an annual basis with the exception of 2014. The report primarily uses data from the Gallup World Poll.

The rankings of national happiness are based on a Cantril ladder survey. Nationally representative samples of respondents are asked to think of a ladder, with the best possible life for them being a 10, and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale. The report correlates the results with various life factors.

In the reports, experts in fields including economics, psychology, survey analysis, and national statistics describe how measurements of well-being can be used effectively to assess the progress of nations, among other topics. Each report is organized into chapters that delve deeper into issues relating to happiness, including mental illness, the objective benefits of happiness, the importance of ethics, policy implications, and links with the Organization for Economic Co-operation and Development’s (OECD) approach to measuring subjective well-being and other international and national efforts.

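Data are collected from people in over 150 countries. Each variable measured reveals a population-weighted average score on a scale running from 0 to 10 that is tracked over time and compared against other countries. These variables currently include: real GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption.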

Each country is also compared against a hypothetical nation called Dystopia. Dystopia represents the lowest national averages for each key variable and is, along with residual error, used as a regression benchmark. The six metrics are used to explain the estimated extent to which each of these factors contributes to increasing life satisfaction when compared to the hypothetical nation of Dystopia, but they themselves do not have an impact on the total score reported for each country.
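To make the idea of a population-weighted average concrete, here is a minimal sketch in R. The scores and populations are hypothetical numbers chosen only to show the arithmetic; they are not values from the World Happiness Report.

```r
# Hypothetical ladder scores (0 to 10) and populations for three groups.
score      <- c(7.0, 5.0, 4.0)
population <- c(10, 30, 60)  # e.g. in millions (made up)

# Population-weighted average: sum(score * population) / sum(population)
weighted_avg <- weighted.mean(score, w = population)
weighted_avg
```

Because the largest group has the lowest score, the weighted average is pulled below the simple mean of the three scores.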

In the data cleaning code, I changed the variable names in each yearly dataset so that they (mostly) match one another. Then, I combined all the datasets into one big “happiness” dataset.
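The renaming and combining step could look roughly like the following. This is only a sketch: the yearly tables, their column names, and the `standardize()` helper are hypothetical stand-ins, and the actual cleaning code may differ.

```r
# Toy yearly files with slightly different column names (assumed names,
# made-up values).
h2015 <- data.frame(Country = c("A", "B"), Happiness.Score = c(7.1, 5.3))
h2019 <- data.frame(`Country or region` = c("A", "B"), Score = c(7.3, 5.1),
                    check.names = FALSE)

# Hypothetical helper: give every yearly table the same column names
# and record which year it came from.
standardize <- function(df, year) {
  names(df) <- c("country", "score")
  df$year <- year
  df
}

# Stack the standardized tables into one "happiness" data frame.
happiness <- rbind(standardize(h2015, 2015), standardize(h2019, 2019))
happiness
```

Renaming by position, as done here, only works when every file lists its columns in the same order; matching on explicit old names would be safer for files that differ more.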

3 Exploration

3.1 Spotify

Some natural questions come up when thinking about which characteristics of music may support each other or, on the other hand, work against each other. For example, one may think that the acousticness and energy of a song are not necessarily compatible traits, whereas danceability and loudness may be positively correlated. The pairs plot in Figure 3.1 explores these relationships and correlations.

Figure 3.1: Pairs plot for the variables in the Spotify dataset

The pairs plot shows that acousticness and energy are inversely related, and quite strongly so (see Figure 3.1, r = -0.967***). There are many other relationships to explore later when building a model. Another promising relationship is the one between loudness and popularity: louder songs also have a tendency to be more popular.
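A correlation like the one reported in the pairs plot can be computed directly with `cor()` and tested with `cor.test()`. The sketch below uses simulated values as a stand-in for the Spotify variables, so its numbers will not match Figure 3.1; asterisks in such plots usually flag the significance level of this kind of test.

```r
set.seed(1)

# Simulated stand-ins for two Spotify features (not the real data):
# energy is constructed to fall as acousticness rises, plus some noise.
acousticness <- runif(200)
energy <- 1 - acousticness + rnorm(200, sd = 0.1)

# Pearson correlation coefficient and the associated significance test.
r <- cor(acousticness, energy)
test <- cor.test(acousticness, energy)

r
test$p.value
```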

3.2 Happiness

Before we begin combining the Spotify dataset with other datasets, it’s important to look at those datasets individually. For example, how does the World Happiness dataset change over each year? The world map below, in Figure 3.2, is an interactive map that displays the change in world happiness in countries over the years from \(2015\) to \(2020\). (Hover over countries to see more specific information.)

Figure 3.2: Map of World Happiness from 2015 - 2020

Figure 3.3: Density Ridges Plot of Happiness by Region

The plot above shows the density of happiness scores across different regions of the world. Note that some countries are included twice (e.g., in both Southeastern Asia and South Asia), and that the \(2017\), \(2018\), and \(2019\) data do not split countries into regions, so the happiness scores for those years are not plotted. The happiest regions appear to be North America and Australia and New Zealand.

Figure 3.4: World Happiness from 2015 - 2020

Figure 3.4 shows the mean happiness of the world over time. The range is typically from \(0\) to \(10\), where \(10\) would be the happiest, so it’s important to note that while there is an increase in happiness after \(2017\), the average only goes up by about \(0.125\) points.
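A yearly mean of this kind can be obtained by grouping the combined happiness data by year. The following is a minimal sketch with made-up scores and assumed column names (`year`, `score`), not the values plotted in Figure 3.4.

```r
# Toy country-level scores for two years (made-up values).
happiness <- data.frame(
  year  = c(2015, 2015, 2016, 2016),
  score = c(7.0, 5.0, 7.2, 5.2)
)

# Mean happiness score per year.
yearly_mean <- aggregate(score ~ year, data = happiness, FUN = mean)
yearly_mean
```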

3.3 Crime

Continuing with the exploration, it is time to look at the crime data. The crime data come from the United Nations Office on Drugs and Crime and depend on reports from each country. This can create discrepancies, because each country has different considerations and guidelines on crime that dictate whether a specific incident enters its records.

Notwithstanding these limitations, it is still worthwhile to look at the data that are available. The Shiny app shows assault rates by country, filtered by region.

The Shiny app makes it possible to see trends in the countries of a given region and how they compare to each other. For example, it is clear that Hungary and the Czech Republic have the highest rates of assault in the Eastern European region. Similar observations can be made elsewhere: each region has a country that stands out.

Also, the lines through the points show that not all countries have data for every year. This can be a limitation; however, the methods used later rely on general trends rather than per-country data.

Beyond that, it is possible to see that rates have not changed much over time. For example, the average rates shown in Figure 3.5 remain fairly stable over the years.

Robbery below… (still needs work)

Figure 3.5: Rates of Crime by Year and Region

4 Comparing Datasets/Variables

4.1 Valence and Happiness

(This is the same EDA graph from earlier, but we limited the years to just \(2015\) to \(2020\) because these are the only years available in our World Happiness dataset.)

## 
## Call:
## lm(formula = mean ~ valence, data = spotify_and_happiness)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0262363 -0.0009372  0.0042383  0.0074585  0.0114632 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.780619   0.006669  716.85   <2e-16 ***
## valence     1.367459   0.014866   91.99   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01231 on 932 degrees of freedom
## Multiple R-squared:  0.9008, Adjusted R-squared:  0.9007 
## F-statistic:  8462 on 1 and 932 DF,  p-value: < 2.2e-16

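(We combined the Spotify and World Happiness datasets by year and fit a linear regression model. We did not check the model assumptions because there are only 6 distinct years of data. The summary above reports 932 residual degrees of freedom, which suggests that the merged data contain many rows per year rather than 6 independent observations. This makes the two variables seem very highly correlated when, in reality, they may not be.)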
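One way to keep the regression honest about how little data there is would be to collapse the tracks to one mean valence per year before merging, so the model sees one row per year. The sketch below uses simulated valence values and made-up happiness means; it illustrates the idea and is not the analysis that produced the output above.

```r
set.seed(42)

# Simulated track-level valence for six years (not the real Spotify data).
tracks <- data.frame(
  year    = rep(2015:2020, each = 50),
  valence = runif(300)
)

# Made-up yearly mean happiness scores, for illustration only.
happiness <- data.frame(year = 2015:2020,
                        mean = c(5.1, 5.2, 5.0, 5.3, 5.4, 5.2))

# Collapse tracks to one mean valence per year BEFORE merging,
# so the regression is fit on one row per year.
valence_by_year <- aggregate(valence ~ year, data = tracks, FUN = mean)
yearly <- merge(valence_by_year, happiness, by = "year")

fit <- lm(mean ~ valence, data = yearly)
nrow(yearly)      # number of observations used in the fit
df.residual(fit)  # residual degrees of freedom
```

With six yearly rows and two estimated coefficients, the fit has four residual degrees of freedom, which makes the small sample size visible in the model summary.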

4.2 Crime and Explicitness

Figure 4.1: Comparing rate of crime (for 100,000 population) and percent explicit music per year for 2010-2017, displayed by region for each type of crime: assault, robbery, and sexual violence
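The quantities compared in Figure 4.1 could be assembled along these lines. This is a sketch under assumptions: `explicit` is taken to be coded 0/1 as in the Spotify data, while the `crime` table, its `assault_rate` column, and all values are hypothetical.

```r
# Toy track-level data: explicit is coded 0/1 (made-up rows).
tracks <- data.frame(
  year     = c(2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011),
  explicit = c(0, 0, 0, 1, 0, 1, 1, 0)
)

# Hypothetical yearly crime rates per 100,000 population (made-up values).
crime <- data.frame(year = c(2010, 2011), assault_rate = c(120, 118))

# Because explicit is 0/1, its mean times 100 is the percent of explicit tracks.
pct_explicit <- aggregate(explicit ~ year, data = tracks,
                          FUN = function(x) 100 * mean(x))
names(pct_explicit)[2] <- "pct_explicit"

# Join the yearly percentages with the yearly crime rates.
combined <- merge(pct_explicit, crime, by = "year")
combined
```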

5 Conclusions & Discussions

## Rows: 5
## Columns: 19
## $ valence          <dbl> 0.420, 0.673, 0.359, 0.935, 0.715
## $ year             <int> 1984, 2013, 1970, 1980, 1990
## $ acousticness     <dbl> 0.615000, 0.085000, 0.125000, 0.000476, 0.003210
## $ artists          <chr> "[\"Singin' In The Rain - Original Cast\"]", "['One …
## $ danceability     <dbl> 0.457, 0.675, 0.399, 0.744, 0.761
## $ duration_ms      <int> 213600, 179413, 297929, 217693, 224173
## $ energy           <dbl> 0.247, 0.874, 0.845, 0.569, 0.715
## $ explicit         <int> 0, 0, 0, 0, 0
## $ id               <chr> "2mKbaww1PC6Gff5kZnuFiJ", "3aGJClg6klBoSa0UZnXeMM", …
## $ instrumentalness <dbl> 0.00218, 0.00000, 0.00109, 0.67000, 0.09890
## $ key              <int> 2, 3, 2, 4, 10
## $ liveness         <dbl> 0.1380, 0.0573, 0.3340, 0.2620, 0.0484
## $ loudness         <dbl> -16.998, -4.268, -8.736, -12.589, -11.699
## $ mode             <int> 1, 1, 1, 0, 0
## $ name             <chr> "Singin' in the Rain", "Does He Know?", "Gallows Pol…
## $ popularity       <int> 33, 63, 36, 32, 44
## $ release_date     <chr> "1984-12-31", "2013-11-25", "1970-10-05", "1980", "1…
## $ speechiness      <dbl> 0.0359, 0.0494, 0.0369, 0.0326, 0.0669
## $ tempo            <dbl> 144.548, 139.021, 105.911, 131.764, 101.132

(This will depend on what happens after we look at valence & world happiness and crime & explicitness.)

The main purpose of the conclusion will also be to re-evaluate the comparisons while drawing from the EDA to guide a discussion.